The Compiler So Far

• Lexical analysis
  Detects inputs with illegal tokens

• Parsing
  Detects inputs with ill-formed parse trees

• Semantic analysis
  Last “front end” phase
  Catches all remaining compile-time errors

Why a Separate Semantic Analysis?

• Parsing cannot catch some errors
• Some language constructs are not context-free (e.g., whether every identifier is declared before it is used)

What Does Semantic Analysis Do?

Checks of many kinds . . . :
• All identifiers are declared
• Types are used correctly (type checking)
• Inheritance relationships
• Classes defined only once
• Methods in a class defined only once
• Reserved identifiers are not misused
• And others . . .

The requirements depend on the language

Run-time Environments

Status

We have covered the front-end phases
• Lexical analysis
• Parsing
• Semantic analysis

Next are the back-end phases
• Optimization
• Code generation

Start with code generation first . . .

Run-time Environments

Before discussing code generation, we need to understand what we are trying to generate

There are a number of standard techniques for structuring executable code that are widely used

Main Points

• Management of run-time resources
• Correspondence between static (compile-time) and dynamic (run-time) structures
• Storage organization

Run-time Resources

Execution of a program is initially under the control of the operating system

When a program is invoked:
• The OS allocates space for the program
• The code is loaded into part of the space
• The OS jumps to the entry point (i.e., “main”)

Memory Layout

[Figure: memory drawn from Low Address (top) to High Address (bottom), with the Code area first and Other Space below it]

Notes

By tradition, pictures of machine organization have:
• Low address at the top
• High address at the bottom
• Lines delimiting areas for different kinds of data

These pictures are simplifications
• E.g., not all memory need be contiguous

What is Other Space?

• Holds all data for the program
• Other Space = Data Space

Compiler is responsible for:
• Generating code
• Orchestrating use of the data area

Code Generation Goals

Two goals:
• Correctness
• Speed

Most complications in code generation come from trying to be fast as well as correct

Assumptions about Execution

(1) Execution is sequential; control moves from one point in a program to another in a well-defined order

(2) When a procedure is called, control eventually returns to the point immediately after the call

Do these assumptions always hold?

Activations

An invocation of procedure P is an activation of P

The lifetime of an activation of P is
• All the steps to execute P
• Including all the steps in procedures P calls

Lifetimes of Variables

The lifetime of a variable x is the portion of execution in which x is defined

Note that
• Lifetime is a dynamic (run-time) concept
• Scope is a static concept

Activation Trees

Assumption (2) requires that when P calls Q, then Q returns before P does

Lifetimes of procedure activations are properly nested

Activation lifetimes can be represented as a tree

Example

Class Main {
  g() : Int { 1 };
  f() : Int { g() };
  main() : Int {{ g(); f(); }};
};

[Figure: activation tree with Main at the root; Main's children are g and f, and f has one child, g]

Example 2

Class Main {
  g() : Int { 1 };
  f(x : Int) : Int { if x = 0 then g() else f(x - 1) fi };
  main() : Int {{ f(3); }};
};

What is the activation tree for this example?
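
One way to answer the question is to run the program and record each activation as a child of whichever activation is currently running. The sketch below is a minimal Python model of Example 2, assuming each Cool method becomes a Python function and calls go through a hypothetical `Tracer.call` helper (not part of any real compiler). Because activations are properly nested, a list used as a stack is enough to know the current parent.

```python
# A minimal sketch: model Example 2's Cool methods as Python functions and
# record the activation tree. Activation, Tracer, and render are hypothetical.

class Activation:
    """One activation: a label plus the activations it makes."""
    def __init__(self, label):
        self.label = label
        self.children = []

class Tracer:
    def __init__(self):
        self.root = Activation("Main")
        self.active = [self.root]   # currently active activations, innermost last

    def call(self, label, body, *args):
        node = Activation(label)
        self.active[-1].children.append(node)   # child of the caller's activation
        self.active.append(node)
        try:
            return body(*args)
        finally:
            self.active.pop()       # the callee returns before its caller does

def run_example2():
    t = Tracer()

    def g():
        return 1

    def f(x):
        # if x = 0 then g() else f(x - 1) fi
        if x == 0:
            return t.call("g", g)
        return t.call(f"f({x - 1})", f, x - 1)

    t.call("f(3)", f, 3)            # main() : Int {{ f(3); }}
    return t.root

def render(node, depth=0):
    """Indented lines, one per activation, children below their parent."""
    lines = ["  " * depth + node.label]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines

print("\n".join(render(run_example2())))
```

The printed tree is a single chain, Main, f(3), f(2), f(1), f(0), g, because each activation of f makes exactly one call.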
Notes

• The activation tree depends on run-time behavior
• The activation tree may be different for every program input
• Since activations are properly nested, a stack can track currently active procedures



Dr. Sherin ElGokhy 



Example

Class Main {
  g() : Int { 1 };
  f() : Int { g() };
  main() : Int {{ g(); f(); }};
};

[Figures: the activation tree and the stack of active procedures as the program runs]
• main starts: the tree is Main; the stack is Main
• main calls g: Main gains child g; the stack is Main, g
• g returns and main calls f: Main gains child f; the stack is Main, f
• f calls g: f gains child g; the stack is Main, f, g
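
The stack snapshots above can be reproduced with a few lines of Python. This is a minimal sketch, assuming each Cool method is modeled as a Python function that pushes its name on an explicit control stack on entry and pops it on exit; `control_stack`, `snapshots`, and `activate` are hypothetical names.

```python
# A minimal sketch: an explicit control stack for the Example program.
# control_stack, snapshots, and activate are hypothetical names.

control_stack = ["Main"]              # main's activation is live for the whole run
snapshots = [list(control_stack)]     # stack contents after each call starts

def activate(name, body):
    control_stack.append(name)        # a new activation begins
    snapshots.append(list(control_stack))
    result = body()
    control_stack.pop()               # properly nested: the innermost ends first
    return result

def g():
    return 1

def f():
    return activate("g", g)           # f() : Int { g() }

def main():
    activate("g", g)                  # main() : Int {{ g(); f(); }}
    activate("f", f)

main()
for snap in snapshots:
    print(", ".join(snap))
```

It prints Main, then Main, g, then Main, f, then Main, f, g, matching the four stages listed above; after main finishes only Main remains.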
Revised Memory Layout

[Figure: memory from Low Address (top) to High Address (bottom): the Code area, then the Stack]

Activation Records

The information needed to manage one procedure activation is called an activation record (AR) or frame

If procedure F calls G, then G’s activation record contains a mix of info about F and G

What is in G’s AR when F calls G?

• F is “suspended” until G completes, at which point F resumes. G’s AR contains information needed to
  • Complete the execution of G
  • Resume execution of F
• G’s AR may also contain:
  • G’s return value (needed by F)
  • Actual parameters to G (supplied by F)
  • Space for G’s local variables

The Contents of a Typical AR

• Space for G’s return value
• Actual parameters
• Pointer to the previous activation record
  • The control link; points to AR of caller of G
• Machine status prior to calling G
  • Contents of registers & program counter
• Local variables
• Other temporary values
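
To make the list concrete, here is a minimal Python sketch of one possible AR for G as a record with one field per item above. The class and field names are illustrative assumptions; a real compiler lays these out as words in memory, not as objects. The `caller_chain` helper (also hypothetical) shows what the control link is for: following it walks back through the callers.

```python
# A minimal sketch of a typical AR. ActivationRecord, its field names, and
# caller_chain are illustrative assumptions, not a real compiler's layout.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ActivationRecord:
    proc: str                                          # procedure this AR belongs to
    result: Optional[int] = None                       # space for G's return value
    args: list = field(default_factory=list)           # actual parameters
    control_link: Optional["ActivationRecord"] = None  # AR of G's caller
    saved_pc: Optional[int] = None                     # machine status prior to the call
    saved_regs: dict = field(default_factory=dict)
    local_vars: dict = field(default_factory=dict)     # local variables
    temps: list = field(default_factory=list)          # other temporary values

def caller_chain(ar):
    """Follow control links from an AR back to the outermost caller."""
    names = []
    while ar is not None:
        names.append(ar.proc)
        ar = ar.control_link
    return names

# F calls G: G's AR points back at F's AR. The argument 42 and saved_pc 100
# are made-up values for illustration.
ar_f = ActivationRecord(proc="F")
ar_g = ActivationRecord(proc="G", args=[42], control_link=ar_f, saved_pc=100)
print(caller_chain(ar_g))   # ['G', 'F']
```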
Example 2, Revisited

Class Main {
  g() : Int { 1 };
  f(x : Int) : Int { if x = 0 then g() else f(x - 1) (**) fi };
  main() : Int {{ f(3); (*) }};
};

AR for f:
  result
  argument
  control link
  return address

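Stack After Two Calls to f

[Figure: the stack holds Main's AR, then an AR for f(3) with (result), argument 3, a control link, and return address (*), then an AR for f(2) with (result), argument 2, a control link, and return address (**)]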
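
The stack in the picture can be rebuilt step by step. This is a minimal sketch, assuming the four-field AR for f from Example 2 (result, argument, control link, return address), a Python list standing in for the run-time stack, and list indices standing in for control-link pointers; `stack_after_calls` is a hypothetical helper.

```python
# A minimal sketch: build the stack of ARs for Example 2 after a number of
# calls to f. stack_after_calls is hypothetical; list indices play the role
# of control-link pointers, and (*) / (**) are the symbolic return addresses.

def stack_after_calls(n_calls, start=3):
    # f(0) calls g rather than f, so at most start + 1 activations of f exist.
    if not 0 <= n_calls <= start + 1:
        raise ValueError("f(0) calls g, not f")
    stack = [{"proc": "Main"}]                  # Main's AR (uninteresting)
    arg, ret = start, "(*)"                     # main calls f(3); resume at (*)
    for _ in range(n_calls):
        stack.append({
            "proc": "f",
            "result": None,                     # filled in when f returns
            "argument": arg,
            "control_link": len(stack) - 1,     # index of the caller's AR
            "return_address": ret,
        })
        arg, ret = arg - 1, "(**)"              # f(x - 1) resumes at (**)
    return stack

for ar in stack_after_calls(2):
    print(ar)
```

The first AR for f links back to Main and returns to (*); the second links back to the first f and returns to (**), since only the outermost call comes from main.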
Notes

• Main has no argument or local variables and its result is never used; its AR is uninteresting
• (*) and (**) are return addresses of the invocations of f
  • The return address is where execution resumes after a procedure call finishes
• This is only one of many possible AR designs
  • Would also work for C, Pascal, FORTRAN, etc.

The Main Point

The compiler must determine, at compile-time, the layout of activation records and generate code that correctly accesses locations in the activation record

Thus, the AR layout and the code generator must be designed together!
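
A small sketch of what "determine the layout at compile-time" means: each field of f's AR gets a fixed byte offset, and every access becomes a load at that offset. This assumes 4-byte words, one word per field, offsets measured from the start of the AR via a hypothetical frame pointer `fp`, and a made-up `load` instruction; none of these come from the slides.

```python
# A minimal sketch, assuming 4-byte words, one word per field, and offsets
# measured from a hypothetical frame pointer fp. The "load" instruction
# syntax is made up for illustration.

WORD = 4  # assumed word size in bytes

def layout(fields):
    """Map each AR field name to its byte offset from the frame pointer."""
    return {name: i * WORD for i, name in enumerate(fields)}

# The AR for f from Example 2: result, argument, control link, return address.
f_layout = layout(["result", "argument", "control_link", "return_address"])

def access(field_name):
    # The offset is a compile-time constant, so the emitted code is fixed.
    return f"load fp+{f_layout[field_name]}"

print(f_layout)
print(access("argument"))   # load fp+4
```

Reordering the field list changes every offset, which is why the AR layout and the code generator have to agree.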
Example

The picture shows the state after the call to the 2nd invocation of f returns

Discussion

• The advantage of placing the return value first in a frame is that the caller can find it at a fixed offset from its own frame
• There is nothing magic about this organization
  • Can rearrange order of frame elements
  • Can divide caller/callee responsibilities differently
• An organization is better if it improves execution speed or simplifies code generation

Globals

• All references to a global variable point to the same object
  • Can’t store a global in an activation record
• Globals are assigned a fixed address once
  • Variables with fixed address are “statically allocated”
• Depending on the language, there may be other statically allocated values

Memory Layout with Static Data

[Figure: memory layout from Low Address (top) to High Address (bottom), now including a static data area]

• A value that outlives the procedure that creates it cannot be kept in the AR

    method foo() { new Bar }

  The Bar value must survive deallocation of foo’s AR

• Languages with dynamically allocated data use a heap to store dynamic data

Notes

• The code area contains object code
  • For most languages, fixed size and read only
• The static area contains data (not code) with fixed addresses (e.g., global data)
  • Fixed size, may be readable or writable
• The stack contains an AR for each currently active procedure
  • Each AR usually fixed size, contains locals
• Heap contains all other data
  • In C, heap is managed by malloc and free

Notes (Cont.)

• Both the heap and the stack grow
• Must take care that they don’t grow into each other
• Solution: start heap and stack at opposite ends of memory and let them grow towards each other
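
The "opposite ends" solution can be sketched with two pointers into one region. This is a minimal model, assuming a flat address range [lo, hi) in which the heap grows up from lo and the stack grows down from hi; the `DataSpace` class and its methods are hypothetical, and real allocators and stacks are far more involved.

```python
# A minimal sketch, assuming a flat address range [lo, hi): the heap grows up
# from lo and the stack grows down from hi. DataSpace is hypothetical.

class DataSpace:
    def __init__(self, lo, hi):
        self.heap_top = lo      # next free heap address (heap grows upward)
        self.stack_top = hi     # lowest address used by the stack (grows downward)

    def _check(self, new_heap_top, new_stack_top):
        # The regions collide only if the heap would pass the stack.
        if new_heap_top > new_stack_top:
            raise MemoryError("heap and stack would overlap")

    def heap_alloc(self, size):
        addr = self.heap_top
        self._check(self.heap_top + size, self.stack_top)
        self.heap_top += size
        return addr

    def push_frame(self, size):
        self._check(self.heap_top, self.stack_top - size)
        self.stack_top -= size
        return self.stack_top   # start address of the new AR

    def pop_frame(self, size):
        self.stack_top += size

mem = DataSpace(0, 100)
print(mem.heap_alloc(40))   # 0
print(mem.push_frame(40))   # 60
```

The design choice is that neither region has a fixed limit: the free space in the middle is shared, so memory runs out only when the two tops actually meet.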
Memory Layout with Heap

[Figure: memory layout from Low Address (top) to High Address (bottom), now including a heap]

Data Layout

Low-level details of machine architecture are important in laying out data for correct code and maximum performance

Chief among these concerns is alignment

Alignment

• Most modern machines are either 32 bit or 64 bit
  • 8 bits in a byte
  • 4 or 8 bytes in a word
• Machines are either byte or word addressable
• Data is word aligned if it begins at a word boundary
• Most machines have some alignment restrictions
  • Or performance penalties for poor alignment

Alignment (Cont.)

• Example: A string

    “Hello”

  Takes 5 characters (without a terminating \0)
• With 4-byte words, to word align the next datum, add 3 “padding” characters to the string (5 + 3 = 8 bytes, two full words)
• The padding is not part of the string, it’s just unused memory
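
The padding arithmetic generalizes to a one-line formula: after `size` bytes, the next word boundary is `(-size) mod word` bytes away. Below is a minimal Python sketch, assuming the word size is a parameter (4 bytes by default, as in the "Hello" example); `padding` and `aligned_offsets` are hypothetical helper names.

```python
# A minimal sketch, assuming a configurable word size (4 bytes by default).
# padding and aligned_offsets are hypothetical helpers.

def padding(size, word=4):
    """Bytes of padding needed after `size` bytes to reach a word boundary."""
    return (-size) % word

def aligned_offsets(sizes, word=4):
    """Place objects one after another, starting each on a word boundary."""
    offsets, pos = [], 0
    for size in sizes:
        offsets.append(pos)
        pos += size + padding(size, word)
    return offsets

print(padding(len("Hello")))          # 3
print(aligned_offsets([5, 4, 1, 8]))  # [0, 8, 12, 16]
```

In the second call the 5-byte string occupies offsets 0 to 4 plus 3 bytes of padding, so the next object starts at 8; the 1-byte object at 12 is likewise padded so the last one starts at 16.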
Stack Machines

• A simple evaluation model
• Only storage is the stack
  • No variables or registers
• A stack of values for intermediate results
• Each instruction:
  • Takes its operands from the top of the stack
  • Removes those operands from the stack
  • Computes the required operation on them
  • Pushes the result on the stack

Example of Stack Machine Operation

• The addition operation on a stack machine

[Figure: the two top values are popped, added, and their sum is pushed back on the stack (pop, add, push)]

Example of a Stack Machine Program

• Consider two instructions
  • push i - place the integer i on top of the stack
  • add - pop two elements, add them and put the result back on the stack
• A program to compute 7 + 5:

    push 7
    push 5
    add
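
The two-instruction machine is small enough to interpret directly. This is a minimal sketch, assuming a program is a list of tuples such as `("push", 7)` and `("add",)`; the tuple encoding and the `run` function are assumptions, not a standard format.

```python
# A minimal sketch of the pure stack machine, assuming programs are lists of
# tuples like ("push", 7) and ("add",). Only the two slide instructions exist.

def run(program):
    stack = []
    for instr in program:
        op = instr[0]
        if op == "push":
            stack.append(instr[1])        # place the integer on top
        elif op == "add":
            right = stack.pop()           # pop two elements...
            left = stack.pop()
            stack.append(left + right)    # ...and push their sum
        else:
            raise ValueError(f"unknown instruction: {op}")
    return stack

print(run([("push", 7), ("push", 5), ("add",)]))   # [12]
```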
Why Use a Stack Machine?

• Each operation takes operands from the same place and puts results in the same place
• This means a uniform compilation scheme
• And therefore a simpler compiler
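
The "uniform compilation scheme" can be seen in a compiler that handles every subexpression the same way: emit code for the operands, then the operator. This is a minimal sketch, assuming expressions are nested tuples (an int literal, or `("+", left, right)`) and the output uses the same tuple instructions as a plain stack machine; `compile_expr` is a hypothetical name.

```python
# A minimal sketch, assuming expressions are ints or ("+", left, right) and
# output instructions are tuples like ("push", 7) and ("add",).

def compile_expr(e):
    if isinstance(e, int):
        return [("push", e)]                    # a constant is just pushed
    op, left, right = e
    if op != "+":
        raise ValueError(f"unsupported operator: {op}")
    # Same scheme at every level: operands first, then the operation.
    return compile_expr(left) + compile_expr(right) + [("add",)]

print(compile_expr(("+", 7, 5)))
print(compile_expr(("+", 3, ("+", 7, 5))))
```

The compiler never needs to decide where an operand lives or where a result goes, since both are always the top of the stack.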






Why Use a Stack Machine? (Cont.)

• Location of the operands is implicit
  • Always on the top of the stack
• No need to specify operands explicitly
• No need to specify the location of the result
• Instruction “add” as opposed to “add r1, r2”
  => Smaller encoding of instructions
  => More compact programs
• This is one reason why Java Bytecodes use a stack evaluation model

Optimizing the Stack Machine

• The add instruction does 3 memory operations
  • Two reads and one write to the stack
  • The top of the stack is frequently accessed
• Idea: keep the top of the stack in a register (called the accumulator)
  • Register accesses are faster
• The “add” instruction is now

    acc <- acc + top_of_stack

  • Only one memory operation!

Stack Machine with Accumulator

Invariants
• The result of an expression is in the accumulator
• For $\mathit{op}(e_1, \ldots, e_n)$, push the accumulator on the stack after computing each of $e_1, \ldots, e_{n-1}$
  • The result of $e_n$ is in the accumulator before $\mathit{op}$ is executed
  • After the operation, pop $n-1$ values
• Expression evaluation preserves the stack
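
A code generator that follows these invariants for binary addition ($n = 2$) fits in a few lines. This is a minimal sketch, assuming expressions are nested tuples (an int literal, or `("+", e1, e2)`) and instructions are emitted as strings in the slides' notation; `cgen` is a hypothetical name.

```python
# A minimal sketch of accumulator code generation for "+" (n = 2), assuming
# expressions are ints or ("+", e1, e2); instructions are strings in the
# slides' notation.

def cgen(e):
    if isinstance(e, int):
        return [f"acc <- {e}"]               # the result goes in the accumulator
    op, e1, e2 = e
    if op != "+":
        raise ValueError(f"unsupported operator: {op}")
    return (cgen(e1)
            + ["push acc"]                   # save e1's value on the stack
            + cgen(e2)                       # e2's value ends up in acc
            + ["acc <- acc + top_of_stack",  # combine with the saved e1
               "pop"])                       # pop n - 1 = 1 value

for line in cgen(("+", 3, ("+", 7, 5))):
    print(line)
```

Each call to `cgen` pushes exactly as many values as it pops, which is how the "evaluation preserves the stack" invariant is kept.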
Stack Machine with Accumulator. Example

• Compute 7 + 5 using an accumulator

    acc <- 7
    push acc
    acc <- 5
    acc <- acc + top_of_stack
    pop

[Figure: the accumulator (acc) and the stack after each of these instructions]

A Bigger Example: 3 + (7 + 5)

Code                         Acc   Stack
acc <- 3                     3     <init>
push acc                     3     3, <init>
acc <- 7                     7     3, <init>
push acc                     7     7, 3, <init>
acc <- 5                     5     7, 3, <init>
acc <- acc + top_of_stack    12    7, 3, <init>
pop                          12    3, <init>
acc <- acc + top_of_stack    15    3, <init>
pop                          15    <init>
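
The table can be checked by running the code on a tiny accumulator machine. This is a minimal sketch, assuming instructions are strings in the slides' notation and the stack is a Python list whose last element is the top, with the string `"<init>"` standing for whatever was on the stack initially; `run_acc` is a hypothetical name.

```python
# A minimal sketch of an accumulator machine replaying the table. The stack is
# a list with the top last; "<init>" stands for the initial stack contents.

def run_acc(code):
    acc, stack = None, ["<init>"]
    trace = []
    for instr in code:
        if instr == "push acc":
            stack.append(acc)
        elif instr == "pop":
            stack.pop()
        elif instr == "acc <- acc + top_of_stack":
            acc = acc + stack[-1]            # read the top without removing it
        elif instr.startswith("acc <- "):
            acc = int(instr[len("acc <- "):])
        else:
            raise ValueError(f"unknown instruction: {instr}")
        # Record acc and the stack listed top-first, as in the table.
        trace.append((instr, acc, ", ".join(str(v) for v in reversed(stack))))
    return acc, stack, trace

code = ["acc <- 3", "push acc", "acc <- 7", "push acc", "acc <- 5",
        "acc <- acc + top_of_stack", "pop",
        "acc <- acc + top_of_stack", "pop"]
acc, stack, trace = run_acc(code)
for row in trace:
    print(row)
```

Each printed row matches a row of the table. The stack reads 3, <init> both before the code for 7 + 5 starts (row 2) and after it finishes (row 7), which is the preservation property in action.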
Notes

• It is very important that evaluation of a subexpression preserves the stack
  • Stack before the evaluation of 7 + 5 is 3, <init>
  • Stack after the evaluation of 7 + 5 is 3, <init>
  • The first operand is on top of the stack

Thanks



